Dissertation title: Discriminative Interlingual Representations
Authors
Jagadeesh Jagarlamudi (Doctor of Philosophy, 2013; dissertation advised by Hal Daumé III, Department of Computer Science)
Abstract
The language barrier in many multilingual natural language processing (NLP) tasks can be overcome by mapping objects from different languages (“views”) into a common low-dimensional subspace. For example, the name transliteration task involves mapping bilingual names, and word translation mining involves mapping bilingual words, into a common low-dimensional subspace. Multi-view models learn such a subspace from a training corpus of paired objects, e.g., names written in different languages, represented as feature vectors.
The central idea of my dissertation is to learn low-dimensional subspaces (or interlingual representations) that are effective for various multilingual and monolingual NLP tasks. First, I demonstrate the effectiveness of interlingual representations in mining bilingual word translations, and then proceed to develop models for diverse situations that often arise in NLP tasks. In particular, I design models for the following problem settings: 1) when there are more than two views but training data is available only from a single pivot view into each of the remaining views; 2) when an object from one view is associated with a ranked list of objects from another view; and 3) when the underlying objects have rich structure, such as a tree. These problem settings arise often in real-world applications.
I choose a canonical task for each setting and compare my model with existing state-of-the-art baseline systems. I provide empirical evidence for the first two models on multilingual name transliteration and on reranking for part-of-speech tagging, respectively. For the third problem setting, I experiment with the task of re-scoring target-language word translations based on the source word’s context. The model proposed for this problem builds on the ideas proposed in the previous models and, hence, leads to a natural conclusion.
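To make the idea of a shared subspace concrete, the following is a minimal sketch of one standard multi-view technique, canonical correlation analysis (CCA), fitted on paired feature vectors. The abstract does not name the method used in the dissertation, so the choice of CCA here is an assumption, as are the function names (`fit_cca`, `project`, `matching_accuracy`), the regularization constant, and the synthetic data that stands in for real bilingual feature vectors.

```python
import numpy as np


def fit_cca(X, Y, k, reg=1e-3):
    """Fit regularized CCA on paired rows of X (n, dx) and Y (n, dy).

    Returns the view means, projection matrices A (dx, k) and B (dy, k),
    and the top-k canonical correlations. Illustrative sketch only.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    # Whiten each view with its Cholesky factor: T = Lx^{-1} Cxy Ly^{-T}.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    T = np.linalg.solve(Lx, Cxy)
    T = np.linalg.solve(Ly, T.T).T
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    # Map the singular vectors back to the original feature spaces.
    A = np.linalg.solve(Lx.T, U[:, :k])
    B = np.linalg.solve(Ly.T, Vt.T[:, :k])
    return mx, my, A, B, s[:k]


def project(V, mean, P):
    """Center a view and project it into the shared subspace."""
    return (V - mean) @ P


def matching_accuracy(seed=0, n_train=500, n_test=20, k=3):
    """Fit on synthetic paired views, then match held-out pairs by cosine."""
    rng = np.random.default_rng(seed)
    # Hypothetical generative setup: both views are noisy linear images
    # of the same latent vector, standing in for paired names or words.
    Wx = rng.normal(size=(k, 10))
    Wy = rng.normal(size=(k, 8))

    def sample(n):
        Z = rng.normal(size=(n, k))
        X = Z @ Wx + 0.05 * rng.normal(size=(n, 10))
        Y = Z @ Wy + 0.05 * rng.normal(size=(n, 8))
        return X, Y

    Xtr, Ytr = sample(n_train)
    Xte, Yte = sample(n_test)
    mx, my, A, B, corr = fit_cca(Xtr, Ytr, k)
    Px = project(Xte, mx, A)
    Py = project(Yte, my, B)
    Px = Px / np.linalg.norm(Px, axis=1, keepdims=True)
    Py = Py / np.linalg.norm(Py, axis=1, keepdims=True)
    # For each object in view 1, pick the most similar object in view 2.
    nearest = (Px @ Py.T).argmax(axis=1)
    accuracy = float((nearest == np.arange(n_test)).mean())
    return accuracy, corr
```

Calling `matching_accuracy()` fits the projections on the training pairs and reports how often a held-out object in one view retrieves its true counterpart in the other, which mirrors the retrieval framing of tasks such as transliteration or translation mining.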
Similar resources
Discriminative models for robust image classification
A variety of real-world tasks involve the classification of images into pre-determined categories. Designing image classification algorithms that exhibit robustness to acquisition noise and image distortions, particularly when the available training data are insufficient to learn accurate models, is a significant challenge. This dissertation explores the development of discriminative models for...
EDR’s Concept Classification and Description for Interlingual Representation
This paper describes the outline of the EDR Concept Dictionary and gives some examples of interlingual representations as the semantic representations for an input sentence.
From Bilingual Dictionaries to Interlingual Document Representations
Mapping documents into an interlingual representation can help bridge the language barrier of a cross-lingual corpus. Previous approaches use aligned documents as training data to learn an interlingual representation, making them sensitive to the domain of the training data. In this paper, we learn an interlingual representation in an unsupervised manner using only a bilingual dictionary. We fi...
Interlingual Indexing across Different Languages
We present two methods for automatic indexing, which are based on an interlingual layer of content description. In the first approach, we acquire indexing patterns from English documents by statistically relating interlingual representations of English documents (based on text token bigrams) to their associated
Learning from Multiple Views of Data
Title of dissertation: LEARNING FROM MULTIPLE VIEWS OF DATA. Abhishek Sharma, Doctor of Philosophy, 2015. Proposal directed by: Professor David W. Jacobs, Department of Computer Science. This dissertation takes inspiration from the abilities of our brain to extract information and learn from multiple sources of data, and tries to mimic this ability for some practical problems. It explores the hypothes...